The CHIL RT07 Evaluation Data
Abstract
This paper describes the CHIL 2007 evaluation data set provided for the Rich Transcription 2007 Meeting Recognition Evaluation (RT07) in terms of recording setup, scenario, speaker demographics and transcription process. The corpus consists of 25 interactive seminars recorded in multi-sensory smart rooms at five different recording sites in Europe and the United States. We compare speakers' talk-time ratios in the interactive seminars with those in lecture data and multi-party meeting data. We show that the length of individual speakers' contributions helps to position interactive seminars between lectures and meetings in terms of speaker interactivity. We also study the differences between the manual transcriptions of narrow-field and far-field audio recordings.
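As a rough illustration of the talk-time ratio mentioned in the abstract, the sketch below computes each speaker's share of the total speech time from a list of time-stamped segments. The segment format `(speaker_id, start_sec, end_sec)`, the function name `talk_time_ratios`, and the sample values are assumptions made for this example; they are not taken from the CHIL transcription format or from the paper's own method.

```python
from collections import defaultdict


def talk_time_ratios(segments):
    """Return each speaker's share of the summed speech time.

    segments: iterable of (speaker_id, start_sec, end_sec) tuples.
    This layout is a hypothetical one chosen for the example.
    Overlapping speech is counted once per speaker involved.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {speaker!r}")
        totals[speaker] += end - start

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


# Made-up segments for one seminar (not real corpus data).
example = [
    ("presenter", 0.0, 60.0),
    ("participant_1", 60.0, 75.0),
    ("presenter", 75.0, 100.0),
]
ratios = talk_time_ratios(example)
```

Under this definition, a lecture would tend to show one speaker with a ratio close to 1, while a multi-party meeting would tend to spread the ratios more evenly across speakers.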
Related resources
The LIMSI RT07 Lecture Transcription System
A system to automatically transcribe lectures and presentations has been developed in the context of the FP6 Integrated Project CHIL. In addition to the seminar data recorded by the CHIL partners, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as th...
The IBM Rich Transcription 2007 Speech-to-Text Systems for Lecture Meetings
The paper describes the IBM systems submitted to the NIST Rich Transcription 2007 (RT07) evaluation campaign for the speech-to-text (STT) and speaker-attributed speech-to-text (SASTT) tasks on the lecture meeting domain. Three testing conditions are considered, namely the multiple distant microphone (MDM), single distant microphone (SDM), and individual headset microphone (IHM) ones – the latter...
The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings
We present the IBM systems for the Rich Transcription 2007 (RT07) speaker diarization evaluation task on lecture meeting data. We first overview our baseline system that was developed last year, as part of our speech-to-text system for the RT06s evaluation. We then present a number of simple schemes considered this year in our effort to improve speaker diarization performance, namely: (i) A bet...
Data Collection for the CHIL CLEAR 2007 Evaluation Campaign
This paper describes in detail the data that was collected and annotated during the third and final year of the CHIL project. This data was used for the CLEAR evaluation campaign in spring 2007. The paper also introduces the CHIL Evaluation Package 2007 that resulted from this campaign including a complete description of the performed evaluation tasks. This evaluation package will be made avail...
Speaker diarization for meeting room audio
This paper describes a speaker diarization system for the 2007 NIST Rich Transcription (RT07) Meeting Recognition Evaluation, targeting the Multiple Distant Microphone (MDM) task in meeting room scenarios. The system includes three major modules: data preparation, initial speaker clustering and cluster purification/merging. The data preparation consists of the raw data Wiener filtering and beamforming, ...